A new axiomatic approach to diversity 



Chris Dowden 
LIX, École Polytechnique, 91128 Palaiseau Cedex, France



Abstract 

The topic of diversity is an interesting subject, both as a purely mathematical
concept and also for its applications to important real-life situations.
Unfortunately, although the meaning of diversity seems intuitively clear, no
precise mathematical definition exists. In this paper, we adopt an axiomatic
approach to the problem, and attempt to produce a satisfactory measure.

1. Introduction

Over the last twenty years, an important problem in conservation biology 
has been how best to measure the 'diversity' of a set of species. This is because 
diversity has emerged as a leading criterion when prioritising species to be saved 
from extinction. The topic also has applications in a wide range of other
fields, such as linguistics and economics, but in this paper we examine it as a
mathematical concept. 

There are two distinct challenges. The first is how to accurately evaluate 
the diversity of any two elements (e.g. species), and the second is how to then 
use these pairwise-diversities, or 'distances', to produce scores for sets of size 
greater than two. It is the latter, more mathematically interesting problem, 
that we address here. 

Biologists and economists have produced numerous papers (see [1]-[13] and
the references therein) investigating various different measures. Some of these
are very simple 'rule of thumb' methods (e.g. minimum distance, maximum
distance [?], average distance [?], total distance [?]), while others are more
elaborate (e.g. phylogenetic diversity, which we shall shortly discuss, or
p-median [5]). However, each of these is known to be imperfect, in that they
sometimes rank sets in a counter-intuitive order.

The most popular method seems to be phylogenetic diversity [?]. Given
the tree-like structure of evolutionary relationships, phylogenetic diversity was
developed for the specialised case when the pairwise-diversities induce a
tree-metric, with the score of a set of organisms being defined to be the length
of the minimal subtree connecting them. For example, given the tree shown in
Figure 1, the sets {u,v,w} and {u,v,x} score 14 and 22, respectively, and so
the latter would be considered as the more diverse.

Figure 1: An example of phylogenetic diversity.

One problem with phylogenetic diversity is that, in practice, the
pairwise-diversities will often not induce a tree-metric. Moreover, even with a
tree-metric, sets can still sometimes get ranked in an undesirable order. For
example, in Figure 1 the set {u, x, y} would score more than the set {u, w, y}
(20 compared to 19), even though w is very different from both u and y, while x
is very similar to y. Indeed, adding w to the set {u, y} does not increase the
phylogenetic diversity score at all, which seems strongly counter-intuitive!

It is the object of this paper to introduce a new axiomatic approach to
diversity, in an attempt to produce a measure that never disagrees with
intuition (furthermore, we shall only assume that the pairwise-diversities
satisfy the properties of a metric, and not necessarily a tree-metric). Such an
idea has already been discussed in [?] and [?], but the axioms suggested were
not completely satisfactory, and hence resulted in some undesirable measures
being accepted (for example, it is shown that the unique way to satisfy the
axioms of [?] is to rank sets according to their maximum distance).

To avoid confusion, we should mention in passing that there are a number
of papers (see, for example, [?] or [?]) in which the term 'diversity' is used
to mean the total number of features contained by a set (e.g. the number of
different books possessed by a set of libraries). Also, there is an unrelated
section of the literature (see, for example, [1] or [?]) which uses the term to
refer to a type of entropy. In most situations, these definitions are not
consistent with the notion of diversity explored in this paper (although there
have been some attempts to unify the different approaches, see e.g. [?] or [10]).

Finally, it is worth remarking that, even without the biological motivation,
the question under discussion in this paper seems very natural: given two
collections of points in a metric space, which is the more spread out? It seems
surprising that the topic has never previously been investigated by
mathematicians.

The remainder of the paper is divided into four main sections. In the first,
Section 2, we propose four basic axioms that any sensible diversity measure
should satisfy. In Section 3, we then present some measures that fulfil all these
requirements (the first such measures ever to be produced). In Section 4, we
discuss one further property that a perfect diversity measure should be expected
to satisfy, and finally, in Section 5, we present a new measure that seems to
have all these properties.

2. Axioms 

Throughout the rest of this paper, we shall assume that we are given a
complete weighted graph, where the edge-weights denote the pairwise-diversities
of the vertices (this is slightly different from the tree-like structure of
Figure 1). We will henceforth refer to these pairwise-diversities as 'distances',
and we shall assume that they satisfy the properties of a metric. Our aim is to
construct a way to use these distances to give a score for the overall diversity
of any subset of our collection of vertices. To that end, we will spend this
section proposing four axioms that any satisfactory diversity measure, $D$,
should satisfy.

We start with three properties that are hoped to be intuitively natural: 

Axiom 1. For any non-empty set of vertices $S$, we have
$D(S \cup \{x\}) \geq D(S)$ for all $x$, with equality if and only if $x \in S$.

Axiom 2. For any two vertex-sets $S = \{s_1, s_2, \ldots, s_n\}$ and
$T = \{t_1, t_2, \ldots, t_n\}$ satisfying
$D(\{t_i,t_j\}) \geq D(\{s_i,s_j\})$ for all $i$ and $j$, we have
$D(T) \geq D(S)$, with equality if and only if
$D(\{t_i,t_j\}) = D(\{s_i,s_j\})$ for all $i$ and $j$.

Axiom 3. Continuity. Given any set of vertices $S = \{s_1, s_2, \ldots, s_n\}$
and any $\epsilon > 0$, there exists a $\delta(S,\epsilon) > 0$ such that, for
any set of vertices $T = \{t_1, t_2, \ldots, t_n\}$ satisfying
$|D(\{t_i,t_j\}) - D(\{s_i,s_j\})| < \delta$ for all $i$ and $j$, we have
$|D(T) - D(S)| < \epsilon$.
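
To make Axiom 1 concrete, the following minimal sketch checks two of the simple 'rule of thumb' measures on points of the real line. The point sets and the function names are illustrative assumptions, not taken from the paper; the sketch only shows that adding a genuinely new vertex need not strictly increase these scores.

```python
from fractions import Fraction
from itertools import combinations


def pairwise_distances(points):
    """All pairwise distances between points on the real line."""
    return [abs(a - b) for a, b in combinations(points, 2)]


def max_distance(points):
    """'Maximum distance' rule of thumb."""
    return max(pairwise_distances(points))


def average_distance(points):
    """'Average distance' rule of thumb."""
    d = pairwise_distances(points)
    return sum(d) / len(d)


# Hypothetical example: S = {0, 1}, and x = 1/2 is a vertex not in S.
S = [Fraction(0), Fraction(1)]
S_plus_x = S + [Fraction(1, 2)]

# Axiom 1 would require a strict increase in both cases.
print(max_distance(S), max_distance(S_plus_x))
print(average_distance(S), average_distance(S_plus_x))
```

Under these assumptions the maximum distance is unchanged and the average distance decreases, so neither measure satisfies the strict part of Axiom 1.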

It is worth observing that two other desirable properties follow automatically
from these axioms. First, note that Axiom 1 implies
$D(\{x\}) = D(\{x,x\})$, and hence:

Corollary 1. $D(\{x\}) = 0$ for all $x$.

Secondly, it follows from applying Axiom 3 to the sets
$S = \{s_1, s_2, \ldots, s_n, s_n\}$ and $T = \{s_1, s_2, \ldots, s_n, x\}$
(and using the triangle inequality) that we have continuity when adding a new
vertex:

Corollary 2. Given a set of vertices $S = \{s_1, s_2, \ldots, s_n\}$ and an
$\epsilon > 0$, there exists a $\delta(S,\epsilon) > 0$ such that, for any
vertex $x$ satisfying $D(\{x,s_n\}) < \delta$, we have
$D(S \cup \{x\}) < D(S) + \epsilon$.

Our fourth axiom is motivated by the principle that consistent results ought
to be obtained regardless of differences in the scale used to measure the original
distances. For example, if we wish to compare the diversity of the locations
of stars in two different galaxies, then the resultant ranking should not depend
on whether the distances were measured in light-years or kilometres. In other
words, multiplying all our original distances by some constant $c$ should not
affect whether or not $D(S) > D(T)$ for any sets $S$ and $T$:

Axiom 4. Scaling. Given four sets of vertices
$S = \{s_1, s_2, \ldots, s_n\}$, $S' = \{s'_1, s'_2, \ldots, s'_n\}$,
$T = \{t_1, t_2, \ldots, t_m\}$ and $T' = \{t'_1, t'_2, \ldots, t'_m\}$, if
$D(\{s'_i,s'_j\}) = cD(\{s_i,s_j\})$ for all $i$ and $j$ and
$D(\{t'_i,t'_j\}) = cD(\{t_i,t_j\})$ for all $i$ and $j$, for some constant
$c > 0$, then $D(S') > D(T')$ if and only if $D(S) > D(T)$.

By considering the case when $|T| = 2$, this implies the following:

Corollary 3. Given two sets $S = \{s_1, s_2, \ldots, s_n\}$ and
$S' = \{s'_1, s'_2, \ldots, s'_n\}$, if
$D(\{s'_i,s'_j\}) = cD(\{s_i,s_j\})$ for all $i$ and $j$, for some constant
$c > 0$, then $D(S') = cD(S)$.
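
The step from Axiom 4 to Corollary 3 can be spelled out as follows. This is only a sketch, and it assumes that a two-vertex set $T$ with $D(T) = D(S)$ is available for comparison, which the text does not state explicitly.

```latex
% Sketch of Corollary 3, assuming a pair T = {t_1, t_2} with D(T) = D(S) exists.
\begin{align*}
&\text{Let } T=\{t_1,t_2\} \text{ with } D(T)=D(S), \text{ and let }
  T'=\{t'_1,t'_2\} \text{ with } D(T')=c\,D(T).\\
&\text{Axiom 4 applied to } (S,S',T,T'):\quad D(S')>D(T') \iff D(S)>D(T).\\
&\text{Axiom 4 applied to } (T,T',S,S'):\quad D(T')>D(S') \iff D(T)>D(S).\\
&\text{Since } D(S)=D(T), \text{ neither strict inequality holds, so }
  D(S')=D(T')=c\,D(T)=c\,D(S).
\end{align*}
```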

There is also an 'equidistance' axiom that we will discuss (extensively) in
Section 4.

Note that it is implicit in our whole approach that the diversity score ought 
to be a function only of the distances. While this is certainly sensible if the 
vertices represent points in Euclidean space, since the set of distances uniquely 
defines the set of points (up to translations, rotations and reflections), it is 
perhaps less clear for some other scenarios. 

For example, consider the points in $\{0,1\}^4$ shown in Table 1, and suppose
that we wish to choose one new vertex, either x or y, to add to the set {u, v, w}.
Since x and y both have (Hamming) distance exactly 2 from all the other points,
it is automatic with our approach that they would be considered as equally
good, i.e. the diversity of the set {u, v, w, x} would equal the diversity of the
set {u, v, w, y}. However, since these two sets have fundamental differences
(one is four-dimensional and one is three-dimensional), it could be argued that
forcing their diversity scores to be equal is, while hardly ridiculous, rather
heavy-handed. Nevertheless, we shall turn a blind eye to such objections, as it
is difficult to know how to proceed otherwise.

    Points    Co-ordinates
    u         1 0 0 0
    v         0 1 0 0
    w         0 0 1 0
    x         0 0 0 1
    y         1 1 1 0

Table 1: A selection of points on a four-dimensional cube.
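
The claim about Hamming distances can be checked directly. The sketch below assumes the co-ordinates u = 1000, v = 0100, w = 0010, x = 0001 and y = 1110, which is the reading of Table 1 that agrees with the surrounding description; the variable and function names are illustrative.

```python
from itertools import combinations

# Assumed co-ordinates for the points of Table 1 (a reconstruction).
points = {"u": "1000", "v": "0100", "w": "0010", "x": "0001", "y": "1110"}


def hamming(a, b):
    """Number of co-ordinates in which the bit strings a and b differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def distance_table(names):
    """Pairwise Hamming distances within the named set of points."""
    return {(p, q): hamming(points[p], points[q])
            for p, q in combinations(names, 2)}


with_x = distance_table(["u", "v", "w", "x"])
with_y = distance_table(["u", "v", "w", "y"])
print(sorted(with_x.values()), sorted(with_y.values()))
```

Both sets have all six pairwise distances equal to 2, so any measure that depends only on the distances must give them the same score.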

3. New diversity measures

In the previous section, we proposed four axioms that any diversity measure
should satisfy. Although the problem seems fairly natural, it is surprisingly
difficult to construct a measure that fulfils all these requirements (indeed,
every method suggested in the existing literature fails either Axiom 1 or one of
the other axioms). However, in this section we shall present a simple system for
obtaining measures that do satisfy all four axioms. This is not the end of the
story, though, and in Section 4 we shall argue that these measures are still not
satisfactory.

Given a real-valued function $f$, let us define the measure $D_f$ recursively
(from our given distances) by using the equation

$$D_f(S) = f(S) + \max_{T \subset S : |T| = |S|-1} \{D_f(T)\} \qquad (1)$$

for all vertex-sets $S$ of size greater than two. Let us call the function $f$
suitable if: (a) $f$ is a continuous function of the distances; (b) if any of
the distances are 0, then $f = 0$; (c) if none of the distances are 0, then $f$
is strictly positive and is a monotonically strictly increasing function of the
distances; and (d) $f$ is 'scale-invariant' in the sense of Axiom 4. For
example, we could choose $f(\{s_1, s_2, \ldots, s_n\})$ to be
$\left(\prod_{1 \leq i<j \leq n} D(\{s_i,s_j\})\right)^{1/\binom{n}{2}}$ or
$\left(\sum_{1 \leq i<j \leq n} D(\{s_i,s_j\})^{-1}\right)^{-1}$, or any linear
combination of these. We will now see that $D_f$ satisfies the axioms if $f$ is
suitable:

Theorem 4. The diversity measure $D_f$ defined in equation (1) satisfies
Axioms 1-4 if the function $f$ is suitable.

Proof. It is immediately clear by induction that $D_f$ satisfies Axioms 2-4, so
it only remains to show that Axiom 1 is satisfied. To do this, we need to prove
that (i) adding a vertex already in the set does not alter the score, and (ii)
adding a vertex not already in the set strictly increases the score.

We shall proceed by induction. Suppose that (i) and (ii) both hold when
adding a vertex to any set of size less than $k$ (note that the base step is a
direct consequence of the fact that the distances satisfy the properties of a
metric), and let us now consider the case when we are adding a vertex $s_{k+1}$
to a set $S = \{s_1, s_2, \ldots, s_k\}$ of size exactly $k$.

First, let us work towards (i) by supposing (without loss of generality)
that $s_{k+1} = s_k$. By part (b) of the definition of suitability, we then have
$f(S \cup \{s_{k+1}\}) = 0$ and so
$D_f(S \cup \{s_{k+1}\}) = \max_{i \leq k+1} \{D_f((S \cup \{s_{k+1}\}) \setminus s_i)\}$.
Hence, it suffices to prove that $D_f((S \cup \{s_{k+1}\}) \setminus s_i)$ is
maximised by taking $i \in \{k, k+1\}$. But note that, for $i < k$, the
induction hypothesis implies
$D_f((S \cup \{s_{k+1}\}) \setminus s_i) = D_f(S \setminus s_i) \leq D_f(S)$,
and so we are done.

Now let us work towards (ii) by supposing that $s_{k+1} \notin S$. If the
vertices of $S$ are all distinct, then $f(S \cup \{s_{k+1}\}) > 0$ and the
result is clear. If not, then let $S'$ denote a maximally sized subset of $S$
with vertices that are all distinct. By the induction hypothesis, we have
$D_f(S' \cup \{s_{k+1}\}) > D_f(S')$. But note that, by a combination of the
induction hypothesis and (i), the left-hand side is $D_f(S \cup \{s_{k+1}\})$
and the right-hand side is $D_f(S)$. □

One particular choice for a suitable function would be to take

$$f(\{s_1, s_2, \ldots, s_n\}) = \frac{\binom{n}{2}}{\sum_{1 \leq i<j \leq n} \frac{1}{D(\{s_i,s_j\})}}, \qquad (2)$$

as this has the aesthetically pleasing property that it will always be equal to
1 for the 'regular' case when the distances are all 1 (and hence $D_f(S)$ will
be equal to $|S| - 1$ for this case). However, this is a personal choice and is
certainly not an axiom!
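
As an illustration, the recursion can be sketched in a few lines of Python. The sketch assumes that the maximum in equation (1) ranges over the subsets obtained by removing a single vertex (as in the proof of Theorem 4), that the score of a pair is simply its distance, and that $f$ is the harmonic-mean choice of equation (2); the names `harmonic_f` and `D_f` and the distance-matrix representation are illustrative, not the author's.

```python
from fractions import Fraction
from itertools import combinations


def harmonic_f(dist, verts):
    """Assumed form of equation (2): number of pairs over the sum of 1/distance."""
    pairs = list(combinations(verts, 2))
    if any(dist[i][j] == 0 for i, j in pairs):
        return Fraction(0)  # part (b) of suitability
    return Fraction(len(pairs)) / sum(1 / Fraction(dist[i][j]) for i, j in pairs)


def D_f(dist, verts):
    """Recursion (1): f(S) plus the best score after deleting one vertex."""
    verts = tuple(verts)
    if len(verts) < 2:
        return Fraction(0)
    if len(verts) == 2:
        return Fraction(dist[verts[0]][verts[1]])
    best = max(D_f(dist, verts[:i] + verts[i + 1:]) for i in range(len(verts)))
    return harmonic_f(dist, verts) + best


# Regular case: four vertices, all distances equal to 1.
unit4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(D_f(unit4, range(4)))
```

For the regular four-vertex case this returns 3, in line with the value $|S| - 1$ stated above. The recursion visits every subset repeatedly, so the sketch is only intended for very small sets.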



4. The equidistance axiom 

In the previous section, we saw a scheme for generating diversity measures
that satisfy Axioms 1-4. However, as briefly mentioned earlier, there is also a
fifth axiom that is necessary: that of equidistance. In this section, we shall
explain why such an axiom is desirable and discuss how to define it. We will start
with the case when we just have graphs of order three, which we shall see only
requires a small modification to our previous measures; then we will investigate
a seemingly natural way to extend the concept to graphs of arbitrary order,
which we shall see actually produces an intriguing contradiction; and finally we
will propose a more careful definition. In Section 5, we shall then describe a
pretty measure that appears to always give nice results.

Let us imagine that we have two vertices $x$ and $y$ that are distance 1 apart,
and that we wish to add one more vertex to this set. Suppose that we are free
to choose any element from $\{z : D(\{x,z\}) + D(\{y,z\}) = 2\}$. It seems
intuitive that the overall diversity score ought to be greater the more
equidistant the new vertex is between $x$ and $y$. Unfortunately, this is
actually not true for the diversity measure defined at the end of the last
section, where we use the suitable function of equation (2) in recursion (1),
since we know that the regular unit triangle scores 2, whereas the triangle with
lengths $1$, $\frac{1}{2}$ and $\frac{3}{2}$ scores
$\frac{3}{1 + 2 + \frac{2}{3}} + \max\{1, \frac{1}{2}, \frac{3}{2}\} = \frac{9}{11} + \frac{3}{2} > 2$.
This example establishes the need for a new axiom:

Axiom 5. Three-vertex equidistance. Given two sets $S = \{s_1, s_2, s_3\}$ and
$T = \{t_1, t_2, t_3\}$, if $D(t_1,t_2) = D(s_1,s_2)$ and
$D(t_1,t_3) + D(t_2,t_3) = D(s_1,s_3) + D(s_2,s_3) = \lambda$, but
$|D(t_1,t_3) - \frac{\lambda}{2}| < |D(s_1,s_3) - \frac{\lambda}{2}|$ (and
$|D(t_2,t_3) - \frac{\lambda}{2}| < |D(s_2,s_3) - \frac{\lambda}{2}|$), then
$D(T) > D(S)$.

One way to approach the problem of trying to satisfy Axiom 5 would be to
find a suitable function $f$ for which the partial derivative with respect to the
maximum distance (when the total distance is fixed) is always less than $-1$,
thus offsetting the contribution to $D_f$ of the
$\max_{T \subset S : |T| = 2}\{D(T)\}$ term.

However, a neater solution is to instead develop a separate diversity measure
for sets of size three that does satisfy Axiom 5 (and also Axioms 1-4) and
then simply use the recursion of equation (1) on this (it can be checked that
Theorem 4 will still hold). For example, we could set



$$D(\{s_1, s_2, s_3\}) = g(\{s_1, s_2, s_3\}) + \frac{1}{2} \sum_{1 \leq i<j \leq 3} D(\{s_i,s_j\}),$$

where $g$ is any suitable function that also satisfies Axiom 5. It is simple to
see that $D(\{s_1, s_2, s_3\})$ satisfies Axioms 2-5, and Axiom 1 (for the case
when we are comparing sets of size three with sets of size two) follows from
using the triangle inequality on the second term. An aesthetically pleasing
choice, if we then use recursion (1) with the suitable function defined in
equation (2), is to take
$g(\{s_1, s_2, s_3\}) = \frac{3}{2} \left(\sum_{1 \leq i<j \leq 3} \frac{1}{D(\{s_i,s_j\})}\right)^{-1}$,
so that

$$D(\{s_1, s_2, s_3\}) = \frac{3}{2} \left(\sum_{1 \leq i<j \leq 3} \frac{1}{D(\{s_i,s_j\})}\right)^{-1} + \frac{1}{2} \sum_{1 \leq i<j \leq 3} D(\{s_i,s_j\}). \qquad (3)$$
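
A quick numerical check of the three-vertex measure may be helpful. The sketch below assumes that equation (3) reads $\frac{3}{2}(\sum 1/D)^{-1} + \frac{1}{2}\sum D$ over the three pairwise distances, and compares the regular unit triangle with the triangle of side lengths 1, 1/2 and 3/2 used earlier; the function name is illustrative.

```python
from fractions import Fraction


def three_vertex_diversity(a, b, c):
    """Assumed form of equation (3) for a triangle with side lengths a, b, c."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    g = Fraction(3, 2) / (1 / a + 1 / b + 1 / c)  # harmonic-type term
    return g + (a + b + c) / 2                    # plus half the total distance


regular = three_vertex_diversity(1, 1, 1)
skewed = three_vertex_diversity(1, Fraction(1, 2), Fraction(3, 2))
print(regular, skewed)
```

With these inputs the regular triangle scores 2 and the skewed triangle scores 21/11, so the more equidistant configuration now ranks higher, as Axiom 5 asks.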



Of course, we really need something more general than just the three-vertex
rule of Axiom 5. For example, consider the two four-vertex sets, $S$ and $S'$,
depicted in Figure 2. It seems intuitive that $S'$ should be considered as more
diverse than $S$, as the position of $s_4$ is equidistant in relation to $s_1$,
$s_2$ and $s_3$. Unfortunately, the diversity measure that we have just defined
gives $D(S') = 3$, while evaluating recursion (1) for $S$ gives $D(S) > 3$.

Figure 2: Two four-vertex sets, $S$ and $S'$.

One natural way to extend Axiom 5 to any number of vertices is the following:

'Axiom' 5'. Strong equidistance. Given two sets
$S = \{s_1, s_2, \ldots, s_n\}$ and $T = \{t_1, t_2, \ldots, t_n\}$, if
$D(t_i,t_j) = D(s_i,s_j)$ for all $i,j < n$ and
$\sum_{i<n} D(t_i,t_n) = \sum_{i<n} D(s_i,s_n) = \lambda$, but

$$\left|D(t_i,t_n) - \frac{\lambda}{n-1}\right| \leq \left|D(s_i,s_n) - \frac{\lambda}{n-1}\right| \qquad (*)$$

for all $i < n$, then $D(T) \geq D(S)$, with equality if and only if there is
equality in (*) for all $i < n$.



Unfortunately, as we shall now see, it turns out that this strong version
actually leads to inconsistencies with our earlier axioms:

Theorem 5. 'Axiom' 5' is inconsistent with Axioms 1-3.

Proof. Let the vertex-sets $S = \{s_1, s_2, s_3\}$, $T = \{t_1, t_2, t_3\}$ and
$U_n = \{u_1, u_2, u_3\}$ be as shown in Figure 3. By Axiom 2, we have
$D(S) > D(T)$ and so, by continuity (Axiom 3), there exists a $k$ such that
$D(S) > D(U_k)$.

Now consider the set $U'_k = \{u_1, u_2, \ldots, u_{k+2}\} \supset U_k$
constructed from $U_k$ by setting $D(\{u_2,u_l\}) = 0$ for all $l \geq 4$ (i.e.
adding in $k - 1$ extra copies of $u_2$) and, similarly, the set
$S' = \{s_1, s_2, \ldots, s_{k+2}\}$ constructed from $S$ by setting
$D(\{s_2,s_l\}) = 0$ for all $l \geq 4$ (i.e. adding in $k - 1$ extra copies of
$s_2$).

If we assume the strong equidistance of 'Axiom' 5', then $D(U'_k) > D(S')$.
But, by Axiom 1, $D(U'_k) = D(U_k)$ and $D(S') = D(S)$. Hence, we find that
$D(U_k) > D(S)$, and so we have a contradiction. □

Note that (with a bit of care to ensure that the triangle inequality is not
violated during the proof) a form of this example still produces a
contradiction even if we alter the strong equidistance 'axiom' to include the
condition $D(t_i,t_n) = D(s_i,s_n)$ for all $i > 3$ as well as
$\sum_{i<n} D(t_i,t_n) = \sum_{i<n} D(s_i,s_n)$. This seems very surprising!

However, let us now recall our original four-vertex example of Figure 2. The
intuition that it was desirable for the fourth vertex to be equidistant in relation
to $s_1$, $s_2$ and $s_3$ perhaps stems from the fact that these other three
vertices were all in symmetric positions to begin with. Hence, it seems sensible
that a satisfactory equidistance axiom should have to take into account such
symmetry considerations (note that this was not necessary for the three-vertex
case, since every set of size two is automatically symmetric).

Another fault with 'Axiom' 5' is that, by only considering the case when we
wish to add a new vertex, it perhaps does not cover all desirable situations. For
example, consider the sets $S$ and $T$ shown in Figure 4. It seems intuitive that
$T$ should be thought of as the more diverse, since the edges $t_1t_2$ and
$t_3t_4$ are in symmetric positions, but this would not have been covered by
'Axiom' 5'.

Figure 4: Two sets $S$ and $T$.

Hence, it seems desirable to formulate exactly what we mean by 'symmetry' 
before we attempt to propose another equidistance axiom. With this in mind, 
we now give two definitions: 

Definition 6. Let $G$ be called a partial graph if it can be formed from an
unlabelled edge-weighted complete graph by deleting some of the edge-weights
and, instead, giving (distinct) labels to the associated edges.

For example, the graphs $G_1$ and $G_2$ shown in Figure 5 are two of the partial
graphs that can be formed from (the unlabelled version of) the graph
representing the set $S$ in Figure [?].

Figure 5: Partial graphs $G_1$ and $G_2$.



Definition 7. Given a partial graph $G$ whose labelled edges are
$e_1, e_2, \ldots, e_k$, let us say that $e_i$ is symmetric to $e_l$ in $G$ if
there exists a permutation $e_{i_1}, e_{i_2}, \ldots, e_{i_k}$ of the labels for
which $i_i = l$ and relabelling $e_j$ as $e_{i_j}$ for all $j$ does not change
$G$.

For example, it can be seen that $e_1$ is symmetric to $e_2$ in $G_2$ (indeed,
all edges are symmetric here) by considering the permutation shown in Figure 6.
In $G_1$, however, $e_1$ and $e_2$ are not symmetric.

It can be checked that, for each partial graph, symmetry defines an
equivalence relation on the labelled edges.
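
One way to read Definition 7 computationally is sketched below. It assumes that 'relabelling does not change $G$' means that some permutation of the (unlabelled) vertices preserves every remaining edge-weight and maps labelled edges to labelled edges; this interpretation, the function name and the brute-force search are all assumptions made for illustration.

```python
from itertools import permutations


def symmetric_classes(n, weights, labelled):
    """Group the labelled edges of a partial graph into symmetry classes.

    n        -- number of vertices, named 0 .. n-1
    weights  -- dict mapping frozenset({u, v}) to the weight of a weighted edge
    labelled -- list of frozenset({u, v}) edges whose weights were deleted
    """
    labelled_set = set(labelled)
    parent = {e: e for e in labelled}

    def find(e):
        while parent[e] != e:
            e = parent[e]
        return e

    for perm in permutations(range(n)):
        def image(e):
            return frozenset(perm[v] for v in e)
        # The vertex permutation must send labelled edges to labelled edges ...
        if any(image(e) not in labelled_set for e in labelled):
            continue
        # ... and must preserve the weight of every weighted edge.
        if any(weights.get(image(e)) != w for e, w in weights.items()):
            continue
        for e in labelled:
            a, b = find(e), find(image(e))
            if a != b:
                parent[a] = b

    classes = {}
    for e in labelled:
        classes.setdefault(find(e), []).append(e)
    return list(classes.values())


# Hypothetical triangle: edge {0,1} keeps weight 1, the other two are labelled.
tri_classes = symmetric_classes(
    3, {frozenset({0, 1}): 1}, [frozenset({0, 2}), frozenset({1, 2})])
print(len(tri_classes))
```

In this hypothetical triangle, swapping vertices 0 and 1 exchanges the two labelled edges, so they fall into a single class. The search is factorial in the number of vertices and is only meant for small examples.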

Figure 6: A permutation of the edges of $G_2$.



Now that we have defined symmetry, we will return to the issue of formulat- 
ing a successful equidistance axiom. Our new version comes in two parts, the 
first of which we present immediately: 

Axiom 5"a. Symmetric equidistance, part one. Let $G_S$ be an unlabelled
edge-weighted complete graph representing the set $S$, and let $G_P$ be a
partial graph that can be formed from $G_S$. Let us denote the equivalence
classes of the labelled edges in $G_P$ by
$E_1 = \{e_{11}, e_{12}, \ldots, e_{1k_1}\}$,
$E_2 = \{e_{21}, e_{22}, \ldots, e_{2k_2}\}$, ..., and
$E_l = \{e_{l1}, e_{l2}, \ldots, e_{lk_l}\}$. Now let $G_T$ be the graph formed
from $G_P$ by setting
$w_{G_T}(e_{ij}) = \frac{1}{k_i} \sum_{e \in E_i} w_{G_S}(e)$ for all $i$ and
$j$ (i.e. averaging out the weights of all the edges in each equivalence class)
and let $T$ be the set represented by $G_T$ (it can be checked that $G_T$ will
satisfy the triangle inequality). Then $D(T) \geq D(S)$, with equality if and
only if $T = S$.

For example, Axiom 5"a could be applied to the graphs $G_S$, $G_P$ and $G_T$
shown in Figure 7.

Figure 7: An example of Axiom 5"a.

It is a deliberate decision to only state Axiom 5"a for the case when the edges
in each equivalence class are completely evenly weighted in $G_T$, rather than
just more evenly weighted than in $G_S$, in order to take care that the axiom is
not stronger than our intuition. For example, consider the sets $S$, $T$ and
$T'$ depicted in Figure 8. It certainly seems reasonable to say that $T$ should
be considered more diverse than $S$ (and this can be deduced by applying
Axiom 5"a to the relevant four edges), but, in the light of Theorem 5, it is
perhaps going too far to claim that $T'$ should also definitely be considered
more diverse than $S$ (and, indeed, this is not implied by Axiom 5"a).

However, in the case when we are dealing with an equivalence class contain- 
ing just two edges, a stronger formulation does seem desirable, and this is given 
in the second part of our axiom: 

Axiom 5"b. Symmetric equidistance, part two. Let $G_S$ be an unlabelled
edge-weighted complete graph representing the set $S$, and let $G_P$ be a
partial graph formed from $G_S$ by deleting exactly two of the edge-weights and,
instead, labelling the associated edges as $e_1$ and $e_2$. Suppose $e_1$ and
$e_2$ are symmetric in $G_P$. Let us define
$\lambda := w_{G_S}(e_1) + w_{G_S}(e_2)$, let $G_T$ be any graph formed from
$G_P$ by setting $w_{G_T}(e_1)$ and $w_{G_T}(e_2)$ to be values satisfying
$w_{G_T}(e_1) + w_{G_T}(e_2) = \lambda$ and
$|w_{G_T}(e_1) - \frac{\lambda}{2}| < |w_{G_S}(e_1) - \frac{\lambda}{2}|$ (and
$|w_{G_T}(e_2) - \frac{\lambda}{2}| < |w_{G_S}(e_2) - \frac{\lambda}{2}|$),
and let $T$ be the set represented by $G_T$ (it can be checked that $T$ will
satisfy the triangle inequality). Then $D(T) > D(S)$.



For example, Axiom 5"b could be applied to the graphs $G_S$, $G_P$ and $G_T$
shown in Figure 9 to give $D(T) > D(S)$.

Figure 9: An example of Axiom 5"b.

Note that we can sometimes obtain useful results by applying Axiom 5"b
successively to different edges. For example, a second application of the axiom
could have been used in the previous case to obtain $D(T') > D(S)$, where $T'$
is as depicted in Figure 10.

Figure 10: A second application of Axiom 5"b.

5. A perfect diversity measure? 

In the previous section, we argued the case for a new 'symmetric equidistance'
axiom. Unfortunately, the intricate nature of this seems to make it extremely
difficult to find a satisfactory diversity measure. In particular, the
measures suggested in Section 3 seem irreparably distorted by the
$\max\{D(T)\}$ term, which was critical for satisfying Axiom 1. However, in this
section we shall present a possible alternative that appears to give nice
results without employing such an expression.

Given a set $S = \{s_1, s_2, \ldots, s_n\}$, let

$$p_{kl} = \frac{1/D(\{s_k,s_l\})}{\sum_{1 \leq i<j \leq n} 1/D(\{s_i,s_j\})}$$

for all $k \neq l$, and define $D$ recursively by the equation

$$D(S) = \sum_{1 \leq k<l \leq n} p_{kl} \left(D(\{s_k,s_l\}) + D(S_{kl})\right), \qquad (4)$$

where $S_{kl}$ denotes the set formed from $S$ by 'merging' $s_k$ and $s_l$ into
a new vertex $s_{kl}$ and setting
$D(\{s_{kl},s_i\}) = \frac{D(\{s_k,s_i\}) + D(\{s_l,s_i\})}{2}$ for all $i$ (it
can be checked that the distances will still satisfy the properties of a metric).

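For example, consider the set $S = \{s_1, s_2, s_3\}$ illustrated in Figure 11,
for which the sets $S_{12}$, $S_{13}$ and $S_{23}$ are also depicted. Here, we
would have
$p_{12} = \frac{1/4}{\frac{1}{4} + \frac{1}{2} + \frac{1}{3}} = \frac{3}{13}$
and, similarly, $p_{13} = \frac{6}{13}$ and $p_{23} = \frac{4}{13}$. Hence, we
would obtain
$D(S) = \frac{3}{13}\left(4 + \frac{5}{2}\right) + \frac{6}{13}\left(2 + \frac{7}{2}\right) + \frac{4}{13}(3 + 3) = \frac{153}{26}$.

Figure 11: The set $S$ and the three 'merged' sets $S_{12}$, $S_{13}$ and $S_{23}$.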
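
The merging recursion lends itself to a short implementation. The following is a minimal sketch, assuming that $p_{kl}$ is proportional to $1/D(\{s_k,s_l\})$, that all vertices are distinct (so no distance is zero), and that the input is a symmetric distance matrix; the function names are illustrative. The closed form for three vertices, $\frac{3}{2}(\sum 1/D)^{-1} + \frac{1}{2}\sum D$, is included for comparison.

```python
from fractions import Fraction
from itertools import combinations


def merge_diversity(dist):
    """Recursion (4) on a symmetric distance matrix with positive off-diagonal entries."""
    n = len(dist)
    if n <= 1:
        return Fraction(0)
    pairs = list(combinations(range(n), 2))
    inv_total = sum(1 / Fraction(dist[k][l]) for k, l in pairs)
    total = Fraction(0)
    for k, l in pairs:
        p = (1 / Fraction(dist[k][l])) / inv_total
        rest = [i for i in range(n) if i not in (k, l)]
        m = len(rest) + 1  # remaining vertices plus the merged vertex (placed last)
        merged = [[Fraction(0)] * m for _ in range(m)]
        for a, i in enumerate(rest):
            for b, j in enumerate(rest):
                merged[a][b] = Fraction(dist[i][j])
            # Distance from the merged vertex is the average of the two originals.
            avg = (Fraction(dist[k][i]) + Fraction(dist[l][i])) / 2
            merged[a][m - 1] = merged[m - 1][a] = avg
        total += p * (Fraction(dist[k][l]) + merge_diversity(merged))
    return total


def three_vertex_formula(a, b, c):
    """Closed form for three vertices with pairwise distances a, b, c."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return Fraction(3, 2) / (1 / a + 1 / b + 1 / c) + (a + b + c) / 2


# Triangle with D(s1,s2) = 4, D(s1,s3) = 2, D(s2,s3) = 3.
triangle = [[0, 4, 2], [4, 0, 3], [2, 3, 0]]
print(merge_diversity(triangle), three_vertex_formula(4, 2, 3))
```

For this triangle both routes give 153/26. Each call recurses on every pair, so the running time grows very quickly with the number of vertices and the sketch is only suitable for small sets.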

For sets of size three, this method simplifies to a nice formula: 
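
Theorem 8. The diversity measure $D$ defined in equation (4) satisfies the
formula

$$D(\{s_1, s_2, s_3\}) = \frac{3}{2} \left(\sum_{1 \leq i<j \leq 3} \frac{1}{D(\{s_i,s_j\})}\right)^{-1} + \frac{1}{2} \sum_{1 \leq i<j \leq 3} D(\{s_i,s_j\}).$$

Proof. By equation (4), we have

$$\begin{aligned}
D(\{s_1,s_2,s_3\}) &= p_{12}\left(D(\{s_1,s_2\}) + \frac{D(\{s_1,s_3\}) + D(\{s_2,s_3\})}{2}\right) \\
&\quad + p_{13}\left(D(\{s_1,s_3\}) + \frac{D(\{s_1,s_2\}) + D(\{s_2,s_3\})}{2}\right) \\
&\quad + p_{23}\left(D(\{s_2,s_3\}) + \frac{D(\{s_1,s_2\}) + D(\{s_1,s_3\})}{2}\right) \\
&= \frac{p_{12}D(\{s_1,s_2\})}{2} + \frac{p_{12}}{2} \sum_{1 \leq i<j \leq 3} D(\{s_i,s_j\}) \\
&\quad + \frac{p_{13}D(\{s_1,s_3\})}{2} + \frac{p_{13}}{2} \sum_{1 \leq i<j \leq 3} D(\{s_i,s_j\}) \\
&\quad + \frac{p_{23}D(\{s_2,s_3\})}{2} + \frac{p_{23}}{2} \sum_{1 \leq i<j \leq 3} D(\{s_i,s_j\}) \\
&= \frac{p_{12}D(\{s_1,s_2\})}{2} + \frac{p_{13}D(\{s_1,s_3\})}{2} + \frac{p_{23}D(\{s_2,s_3\})}{2} + \frac{1}{2} \sum_{1 \leq i<j \leq 3} D(\{s_i,s_j\}) \\
&\qquad \text{since } p_{12} + p_{13} + p_{23} = 1 \\
&= \frac{3}{2} \left(\sum_{1 \leq i<j \leq 3} \frac{1}{D(\{s_i,s_j\})}\right)^{-1} + \frac{1}{2} \sum_{1 \leq i<j \leq 3} D(\{s_i,s_j\}) \\
&\qquad \text{by definition of } p_{kl}. \qquad \Box
\end{aligned}$$

Note that this is the same expression as that given in equation (3),
and so this measure certainly satisfies all the axioms when a set has size three.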

The equations produced for larger sets are much more complicated, and
hence more difficult to analyse, but the method appears to always give nice
results. For example, with the sets depicted in Figure 2, we would obtain

$$D(S) = \sum_{1 \leq k<l \leq 4} p_{kl} \left(D(\{s_k,s_l\}) + D(S_{kl})\right) \approx 2.838,$$

where $S_{12}$, $S_{13}$, $S_{14}$, $S_{23}$, $S_{24}$ and $S_{34}$ are as
shown in Figure 12 and each $D(S_{kl})$ is evaluated using Theorem 8, whereas it
is simple to see (by induction) that $D(S') = 3 > D(S)$, as required by
Axiom 5"a.

Figure 12: The 'merged' sets $S_{12}$, $S_{13}$, $S_{14}$, $S_{23}$, $S_{24}$ and $S_{34}$.

Unfortunately, it seems difficult to prove that the measure always satisfies
Axioms 1 and 2, let alone the new equidistance axioms. All experimental
results have been positive, however, and so it is very much hoped that other
mathematicians will explore this measure further.



6. Concluding remarks 

Although many of the properties required of a sensible diversity measure
seem simple, we have seen that it is not easy to produce one. In particular,
the requirement for such a measure to take into account complicated symmetry
considerations (and the fact that it is not obvious precisely what these should
be) seems to make the problem rather difficult. Nevertheless, it is hoped that the
ideas presented in this paper have gone some way towards building a rigorous
framework for diversity, and developing a measure that is truly satisfactory.